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1  Statement  of  the  Problem 

Our  research  aims  to  develop  a  framework  for  adaptive  and  parallel  computation  [30,  31] 
on  geometrically  complex  regions.  In  particular,  we  investigated  dynamic  load  balancing, 
transient  solution  techniques,  and  error  estimation  procedures  for  adaptive  computation. 
Load  balancing  includes  geometrically-  and  topologically-based  procedures  that  are  suit¬ 
able  for  heterogeneous  computation  involving  p-  and  hp-refinement,  time  dependence 
including  local  time  stepping  and  method  orders,  diverse  computing  systems  {e.g.,  clus¬ 
ters  of  workstations),  and  hierarchical  networks  {e.g.,  networks  of  SMPs).  Appropriate 
time  integration  techniques  include  explicit  methods  and  implicit  one-  and  multi-step 
methods.  The  explicit  methods  are  useful  for  problems  having  very  rapid  dynamics. 
Implicit  multistep  methods  are  generally  more  efficient  than  one-step  methods;  however, 
this  need  not  be  the  case  when  local  time  steps  and  method  orders  are  used.  A  pos¬ 
teriori  error  estimation  focus  on  procedures  for  transient  problems  with  emphasis  on 
singularly-perturbed  parabolic  and  hyperbolic  problems,  high-order  methods  involving 
p-  and  /ip-refinement,  and  the  coordination  of  spatial  and  temporal  errors. 
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2  Summary  of  Important  Results 

2.1  Geometry-Based  Object-Oriented  Simulation  Framework 

To  date,  consideration  of  object-oriented  programming  in  simulation  software  has  focused 
on  flexible  structures  with  code  reuse,  application  of  symbolic  computing,  operating  in 
parallel,  linking  with  design  processes,  and  supporting  interacting  multiphysics  simula¬ 
tions.  Building  on  these  efforts  and  the  needs  of  adaptive  simulation  technologies,  we 
have  constructed  a  geometry-based  simulation  frameworks  that  supports  parallel  adap¬ 
tive  simulation  capabilities.  This  system,  referred  to  as  Trellis  is  based  on  [6,  5]: 

•  A  set  of  geometry-based  structures  which  can  support;  (i)  the  direct  linkage  with 
company  CAD  information,  (ii)  all  forms  of  adaptivity  without  introducing  ge¬ 
ometric  approximation  errors,  and  (iii)  the  high  level  integration  of  multiscale, 
multi-physics  analysis  methodologies. 

•  A  careful  decomposition  of  the  geometry,  physics,  mathematical  model,  discretiza¬ 
tion  and  numerical  methods  into  interacting  classes.  These  structures  support  a 
variety  of  equation  discretization  methods.  Both  finite  element  [6,  5]  and  partition 
of  unity  methods  have  been  implemented  [25,  24]. 

•  Adaptive  control  of  each  step  of  the  simulation  process  from  the  selection  of  the 
mathematical  model,  through  the  model  and  domain  discretization,  to  the  selection 
of  application  of  the  numerical  methods  to  solving  the  discrete  system. 

Conceptually  Trellis  is  built  on  the  view  of  an  analysis  as  a  transformation  between 
three  levels  of  description.  The  highest  level  description  is  that  of  the  physical  problem 
which  is  posed  in  terms  of  physical  objects  interacting  with  their  environment.  Since  the 
goal  of  the  analysis  is  to  obtain  reliable  estimates  of  the  response  of  the  system  the  second 
level  is  a  mathematical  problem  description  that  introduces  some  level  of  idealization, 
which  also  needs  to  be  controlled  to  yield  the  desired  accuracy.  The  third  level  is  the 
numerical  discretization  constructed  from  a  mathematical  problem  that  involves  another 
set  of  idealizations  which  also  need  to  be  controlled. 

The  structures  used  to  support  the  problem  definition,  the  discretizations  of  the  model 
and  their  interactions  are  central  to  Trellis.  The  two  structures  of  the  geometric  model 
and  attributes  are  used  to  house  the  problem  definition.  The  analysis  discretizations  are 
housed  in  the  mesh  structure.  The  final  structure  is  the  field  structure  which  houses 
numerical  solution  results. 

The  geometric  model  representation  is  a  non-manifold  boundary  representation  based. 
The  representation  used  for  a  mesh  is  similar  to  that  used  for  a  geometric  model:  a  hi¬ 
erarchy  of  regions,  faces,  edges  and  vertices.  In  addition,  each  mesh  entity  maintains  a 
relation,  called  classification,  to  the  model  entity  that  it  was  created  to  partially  repre¬ 
sent.  Understanding  how  the  mesh  relates  to  the  geometric  model  is  critical  for  both 
mesh  adaptivity  and  understanding  how  the  solution  relates  back  to  the  original  prob¬ 
lem  description.  The  topological  representation  can  also  be  used  to  great  advantage  in 
performing  adaptive  p- version  analyses  as  polynomial  orders  can  be  directly  assigned  to 
the  various  entities. 
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A  problem  with  many  “classic”  numerical  analysis  codes  is  that  the  solution  of  an 
analysis  is  given  in  terms  of  the  values  at  a  set  of  discrete  points.  Trellis  eliminates  this 
problem  by  introducing  a  construct  known  as  a  field  which  describes  the  variation  of  a 
tensor  over  one  or  more  entities  in  a  geometric  model.  The  spatial  variation  of  the  field  is 
defined  in  terms  of  interpolations  defined  over  a  discrete  representation  of  the  geometric 
model  entities,  which  can  be  a  mesh. 

The  Trellis  analysis  process  is  a  series  of  transformations  of  the  problem  from  the 
original  mathematical  problem  description  through  to  sets  of  algebraic  equations  ap¬ 
proximately  representing  the  problem.  The  mathematical  problem  description  level  is 
described  by  a  ContinuousSystem  class,  which  contains  the  geometric  model  and  the  at¬ 
tributes  which  apply  to  that  model,  specified  by  a  particular  case  node  in  the  attribute 
graph.  An  instance  of  a  ContinuousSystem  is  then  transformed  to  an  instance  of  the  class 
DiscreteSystem  which  represents  the  discretized  version  of  the  model  and  attributes  and 
the  weak  form  of  the  partial  differential  equation  (PDE).  The  particular  analysis  class 
that  is  used  depends  on  the  selected  weak  form  of  the  PDE  to  be  solved. 

The  DiscreteSystem  class  represents  the  problem  in  terms  of  contributions  from  a  set 
of  objects  that  live  on  the  discrete  representation  of  the  model.  These  objects  are  called 
SystemContributors.  There  are  three  types  of  SystemContributors:  StiffnessContributors 
contribute  coupling  terms  between  degrees  of  freedom  of  the  system,  ForceContributors 
contribute  terms  to  the  right  hand  side  vector,  and  Constraints  set  specific  values  or 
constraints  to  given  degrees  of  freedom.  These  objects  are  created  by  the  Analysis  object 
and  correspond  to  an  interpretation  of  attributes  consistent  with  the  weak  form  that  the 
Analysis  implements. 

The  Analysis  class  creates  all  of  the  SystemContributors  and  adds  them  to  an  instance 
of  a  DiscreteSystem.  The  DiscreteSystem  is  transformed  into  an  AlgebraicSystem,  an 
Assembler  object.  Multiple  linear  solvers  can  be  used  to  solve  the  AlgebraicSystem.  The 
most  extensive  capability  included  is  the  Portable,  Extensible  Toolkit  for  Scientific  Com¬ 
putation  (PETSc)  from  Argonne  National  Laboratory.  These  procedures  have  the  dual 
advantage  of  working  effectively  in  an  object-oriented  analysis  framework  and  providing 
an  efficient  set  of  linear  algebra  routines. 

Trellis  is  not  complete,  but  is  being  used  to  address  complex  problems  involving  com¬ 
pressible  flows  [20,  18,  19,  17,  16]  and  other  problems.  Steady  and  transient  compressible 
flow  problems  may  be  solved  by  a  Galerkin  space-time  finite  element  formulation  [30]  with 
a  least  squares  stabilization.  Compressible  flow  problems  with  more  transient  effects  are 
solved  by  a  discontinuous  Galerkin  method  with  explicit  time  integration  and  local  time 
stepping  [18].  This  method  is  proving  to  be  very  efficient  since  small  time  steps  are 
restricted  to  regions  containing  shocks,  expansions,  and  other  nonuniformities.  Three- 
dimensional  acoustics  problems  [28,  34]  are  treated  by  high-order  methods  with  h-  and 
p-refinement.  The  same  base  software,  including  the  adaptive  and  parallel  procedures, 
can  handle  these  diverse  applications. 

Example  1.  We  illustrate  some  capabilities  of  the  Trellis  framework  by  using  the  ex¬ 
plicit  discontinuous  Galerkin  software  [18,  20]  to  address  the  three-dimensional  unsteady 
flow  of  a  compressible  gas  in  a  circular  cylinder  that  contains  a  cylindrical  vent.  This 
problem  was  motivated  by  shock  tube  studies  as  part  of  an  investigation  of  perforated 
muzzle  brakes  for  large-calibre  cannons.  Our  focus  is  on  the  quasi-stationary  flow  that 
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exists  behind  the  contact  surface  (diaphragm  of  the  shock  tube)  for  a  short  time.  Thus, 
we  initiate  the  problem  with  a  Mach  1.23  flow  of  helium  in  the  main  tube  and  a  quiet  flow 
in  the  vent.  A  hypothetical  diaphragm  between  the  vent  and  the  main  tube  is  ruptured 
at  time  zero  to  direct  the  flow  into  the  vent.  Using  symmetry  to  divide  the  problem  in 
half,  we  calculate  solutions  on  16  processors  of  an  IBM  SP2  at  the  Rensselaer  Polytech¬ 
nic  Institute  using  local  h-refinement  with  an  initial  mesh  of  80,659  tetrahedral  elements. 
Mesh  partitioning  was  done  by  a  parallel  octree  traversal  at  each  adaptive  h-refinement 
step. 

In  Figure  1,  we  show  projections  of  the  Mach  number  and  velocity  vectors  (left)  and 
of  the  the  mesh  and  partitioning  used  onto  the  symmetry  plane  and  cylindrical  surfaces 
near  the  vent  at  three  (dimensionless)  times. 

The  flow  accelerates  as  it  enters  the  vent.  A  strong  shock  forms  near  the  downwind 
vent-shock  tube  interface  and  a  portion  of  the  flow  in  the  vent  accelerates  to  supersonic 
speeds.  The  reflection  of  the  flow  from  the  downwind  vent  face  produces  a  component 
of  the  flow  at  the  vent  exit  in  a  direction  opposite  to  the  principal  flow  direction.  In 
a  cannon,  this  reduces  recoil.  Flow  features  compare  well  with  experiments  and  earlier 
computational  results.  The  mesh  is  concentrated  in  the  shock  and  expansion  regions  and 
remains  so  as  these  features  evolve.  Likewise,  an  initially  (quasi)  uniform  partitioning 
is  adjusted  to  contain  the  smaller  elements  formed  in  the  shock  and  expansion  regions. 
Variable  time  steps  that  are  smaller  on  the  smaller  elements  of  the  mesh  are  not  visible  in 
the  figure.  This  three-dimensional  unsteady  problem  would  be  difficult  to  solve  without 
such  adaptivity. 

2.2  Dynamic  Load  Balancing 

We  developed  three  dynamic  load  balancing  schemes:  iterative  tree  balancing  [20,  31,  30], 
parallel  sort  inertial  bisection  (PSIRB)  [31,  30],  and  octree  partitioning  (OCTPART) 
[20,  18,  21].  The  first  is  iterative  (incremental)  and  the  latter  two  are  direct  (global). 
All  execute  in  parallel  on  a  spatially-distributed  mesh  as  part  of  the  adaptive  system. 
Performance  of  all  load  balancing  procedures  can  be  enhanced  by  partition  boundary 
smoothing,  predictive  load  balancing,  and  weighting.  Weighting  accounts  for  the  com¬ 
plexities  of  adaptive  p-refinement,  local  time  stepping,  and  heterogeneous  computing 
environments.  Partition  boundary  smoothing  [20]  eliminates  elements  that  protrude  into 
another  partition;  thus,  unnecessarily  increasing  the  number  of  element  faces  on  partition 
boundaries.  These  can  be  reduced  by  a  single  boundary  traversal,  which  is  both  inexpen¬ 
sive  and  effective.  Predictive  load  balancing  uses  the  enrichment  schedule  to  anticipate 
and  correct  an  imbalance  before  element  migration  proceeds  [20,  18,  19,  21].  It  provides 
a  better  balance  during  migration,  saves  time  by,  typically,  working  on  a  coarser  mesh 
prior  to  refinement,  and  often  avoids  the  need  for  rebalancing  during  the  subsequent  com¬ 
putational  step.  The  procedure  has  been  able  to  reduce  rebalancing  times  by  as  much  as 
35%  and  total  computational  time  by  as  much  as  20%  [20]. 
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2.3  Parallel  Data  Management 

We  are  developing  a  system  called  the  Rensselaer  Partition  Model  (RPM)  to  manage 
heterogeneous  computation  [33].  The  system  is  hierarchical  with  a  Partition  Model  de¬ 
scribing  how  the  domain  is  divided  for  computation.  Each  mesh  entity  is  classified  on 
a  single  partition  model  entity.  The  partition  model  is  a  topological  construct  similar 
in  concept  to  the  boundary  representation  used  to  represent  the  domain  geometry  and 
mesh.  The  Process  Model  is  a  disconnected  set  of  nodes  with  each  node  representing  a 
single  process  or  thread  of  execution.  Each  process  model  entity  can  control  one  or  more 
partition  model  entities.  The  Machine  Model  describes  the  computational  environment 
where  the  process  is  running.  The  model  is  a  graph  that  describes  the  various  hierar¬ 
chies  of  the  computational  environment.  Each  terminal  node  of  the  graph  represents 
a  single  CPU  that  has  certain  properties  such  as  available  memory  and  computational 
power.  Higher  nodes  in  the  graph  represent  the  grouping  of  processors  into  SMP  nodes  or 
computer  clusters.  Each  processor  can  communicate  with  any  other  processor;  however, 
preferential  communication  paths  are  represented  in  the  graph. 

2.4  A  Posteriori  Error  Estimation 

Our  concentration  has  been  on  developing  a  posteriori  error  estimates  for  singularly- 
perturbed  reaction-  and  convection-diffusion  problems.  We  have  developed  very  simple 
estimates  that  use  an  odd-even  dichotomy  principle  of  Babuska.  The  errors  of  odd-order 
finite  element  approximations  arise  near  element  boundaries  and  may  be  computed  from 
jumps  in  solution  gradients  at  element  vertices.  Conversely,  the  errors  of  even-order 
approximations  arise  in  element  interiors  and  may  be  computed  by  solving  element-level 
finite  element  problems  using  “bubble  functions”  that  vanish  on  element  boundaries 
[3].  Error  estimates  computed  in  this  manner  are  asymptotically  correct  on  rectangular 
elements.  Evidence  suggests  that  they  are  also  correct  for  triangular  and  quadrilateral 
[2]  elements.  Error  estimates  developed  for  product  spaces  [3]  have  been  extended  to 
modified  hierarchical  spaces  [2]. 

We  have  showed  that  the  spatial  discretization  errors  of  hyperbolic  problems  could 
be  estimated  by  using  Radau  polynomials.  This  was  based  on  earlier  work  of  Adjerid  et 
al.  [1]  who  showed  that  the  Radau  points  are  superconvergence  points  for  convection- 
diffusion  problems  in  the  limit  of  vanishing  diffusion.  These  results  provide  simple  error 
estimation  procedures  for  hyperbolic  problems. 


2.5  Solution  Procedures 

We  have  continued  to  investigate  preconditioning  techniques  for  large,  sparse,  linear 
systems.  In  doing  this,  our  efforts  have  focused  on  the  major  area  of  applying  the 
AMLI  preconditioner  in  parallel.  We  have  generated  and  tested  a  preliminary  version 
of  a  parallel,  p-level  (polynomial-level)  preconditioner  and  have  demonstrated  its  utility 
in  parallel  [34].  We  are  working  on  performance  and  scalability  improvements  to  this 
algorithm. 
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Figure  1:  Projections  of  the  Mach  number  and  velocity  vectors  (left)  and  mesh  and 
partitioning  (right)  onto  the  surfaces  of  a  perforated  cylinder  at  times  0,  0.3,  and  0.6. 
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